Scientific Progress


In this last year of the grant, we have focused our efforts on the analysis of large collections of GPS traces, with the goal of
extracting shared structure in the collection and exploiting it to improve individual trajectories (e.g., in matching them to a
map) or to understand higher-level semantic aspects of the environment from the trajectories. We were fortunate to
have  access  to  a  very  large  set  of  taxicab  traces  from  Beijing  for  this  work. 

I.  Pathlet  Learning  for  Trajectory  Compression  and  Planning 

The  pervasiveness  of  GPS  devices  has  created  many  large  datasets  of  pedestrian  and  vehicle  trajectories.  Compressing  such 
large data sets is obviously of interest. Furthermore, compression is tightly coupled with the extraction of shared structure, which relates to
the semantics of trajectories. Such higher-level trajectory understanding can benefit a variety of applications, ranging from the
study of population migration routes and vehicular traffic patterns to assessing the state of city road networks. In this work, we have sought
to  extract  latent  shared  structure  among  many  human  trajectories. 

Our work is motivated by the seminal work of Gonzalez et al., who discovered that human trajectories show a high degree of
spatial and temporal regularity, i.e., human beings have a high probability of repeating similar mobility patterns. We seek to find a
set of semantically meaningful path segments, referred to as the pathlet dictionary, out of which most trajectories can be
reconstructed by concatenating a few of these segments. We note that this can also be viewed as a joint trajectory
segmentation problem, where the pathlet structure becomes apparent only in the context of a trajectory collection.

We  formulate  pathlet  learning  as  solving  an  integer  linear  program  (ILP),  whose  objective  function  minimizes  the  size  of  the 
pathlet  dictionary  as  well  as  the  number  of  pathlets  that  are  used  to  reconstruct  each  trajectory.  To  solve  this  ILP  on  large 
datasets,  we  introduce  a  decoupled  approach,  which  optimizes  a  lower  bound  of  the  original  objective  function.  We  have  tested 
our algorithm on a large-scale real-world dataset, which contains 230K trajectories of taxi cabs in Beijing. Our algorithm
extracts  a  pathlet  dictionary  containing  around  130K  pathlets,  which  can  reconstruct  all  trajectories  in  the  dataset  using  7 
pathlets  on  average  per  trajectory.  This  number  is  significantly  smaller  than  the  average  number  of  edges  used  to  represent 
trajectories on a road map (which is 60.1), or the average number of “turns” that might be provided in a navigation system to
realize  the  trajectory  (which  is  36.7).  More  interesting  than  these  global  compression  statistics  is  the  semantic  information  one 
gleans  about  Beijing  traffic  through  the  pathlets.  For  example,  in  the  downtown  area,  pathlets  are  shorter,  corresponding  to 
smaller  commute  ranges,  while  for  the  region  near  the  airport  (located  in  a  suburban  area),  the  pathlets  are  longer, 
corresponding  to  long  trips  from  the  city  to  the  airport. 
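
The reconstruction half of this objective can be illustrated on its own. The sketch below assumes the pathlet dictionary is already fixed and that a trajectory is given as a sequence of road-edge ids; it then finds the fewest dictionary pathlets whose concatenation reproduces the trajectory, using a simple dynamic program. It is not the ILP or the decoupled solver described above, which also choose the dictionary itself. The function name `min_pathlet_cover` and the toy edge ids are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

Pathlet = Tuple[str, ...]  # a pathlet is a contiguous run of road-edge ids


def min_pathlet_cover(
    trajectory: Sequence[str], dictionary: Set[Pathlet]
) -> Optional[List[Pathlet]]:
    """Split `trajectory` into the fewest contiguous pieces found in `dictionary`.

    Returns the pieces in order, or None if no such split exists.
    """
    n = len(trajectory)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[i]: fewest pathlets covering trajectory[:i]
    back = [-1] * (n + 1)   # back[i]: start index of the last pathlet used
    best[0] = 0
    max_len = max((len(p) for p in dictionary), default=0)

    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if best[j] + 1 < best[i] and tuple(trajectory[j:i]) in dictionary:
                best[i] = best[j] + 1
                back[i] = j

    if best[n] == inf:
        return None

    # Walk the back-pointers from the end to recover the pieces.
    pieces: List[Pathlet] = []
    i = n
    while i > 0:
        j = back[i]
        pieces.append(tuple(trajectory[j:i]))
        i = j
    pieces.reverse()
    return pieces


# Toy example (made-up edge ids): the trajectory a-b-c-d-e and five pathlets.
toy_dictionary = {("a", "b", "c"), ("d",), ("a",), ("b", "c", "d"), ("d", "e")}
print(min_pathlet_cover(list("abcde"), toy_dictionary))
```

In this toy case the only two-piece split is (a, b, c) followed by (d, e), which mirrors how a 60-edge trajectory can collapse to a handful of pathlets once the dictionary captures commonly driven stretches.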

The pathlet dictionary described above can be useful not only for trajectory compression but also in trajectory synthesis applications as
well. In the Beijing data set, frequently used pathlets in the dictionary represent driving segments chosen by many taxi cab drivers in
Beijing,  reflecting  the  joint  wisdom  of  a  highly  skilled  set  of  professionals.  To  show  the  usefulness  of  the  extracted  pathlets,  we 
implemented a route planning application for the city of Beijing. Experimental results are competitive with those obtained
from Google Maps and at times superior.

II.  Large-Scale  Joint  Map  Matching  of  GPS  Traces 

Map matching is the procedure of determining the path of a user on a map from a sequence of GPS positions of that user, i.e., a
trajectory. This procedure finds use in many mobility-related applications, such as urban traffic modeling, dynamic road map
generation,  and  mobility  pattern  mining.  Since  collecting  highly  accurate  GPS  traces  on  a  city  scale  is  quite  costly,  most  of  the 
trajectory  data  available  today  were  obtained  indirectly  through  GPS-equipped  vehicles  or  users  with  GPS-enabled  cellular 
phones.  The  majority  of  the  collected  trajectories  inevitably  contain  a  large  amount  of  uncertain  and  incomplete  information.  For 
example,  one  form  of  uncertainty  comes  from  GPS  noise,  which  is  particularly  severe  in  urban  environments  due  to  signals 
obstructed  by  or  reflected  off  buildings  (urban  canyons).  Incomplete  data  is  often  the  result  of  a  low  sampling  rate,  due  to 
limitations  on  storage  and  communication  bandwidth.  For  instance,  50%  of  the  Beijing  Taxi  Trajectories  we  employed  in  this 
study  have  at  most  one  sample  per  minute. 

Most existing map matching algorithms take a single GPS trajectory as input. We refer to them as single-track map matching
algorithms  (SMM).  They  are  typically  formulated  with  the  objective  of  minimizing  the  distance  between  the  projected  path  on  the 
map  and  the  input  trajectory,  and  of  achieving  some  other  regularization  objectives,  such  as  minimizing  the  length  of  the  path. 
These  algorithms  work  well  when  the  input  trajectory  is  densely  sampled  and  the  sampling  error  is  small.  However,  their 
performance drops significantly when the input trajectory becomes noisy and sparse. In this case, the correct path is not
necessarily close to the input trajectory, and it may not follow or approximate the shortest path on the map, so both objectives become unreliable.
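
To make the single-track objective concrete, here is a minimal sketch of a generic baseline: each GPS sample is assigned to a road segment so that the sum of point-to-segment distances plus a penalty for switching segments is minimal, found with a Viterbi-style dynamic program. It is not one of the specific SMM algorithms referred to above, nor the multi-track method of this report. The segment names, the penalty values `switch_cost` and `jump_cost`, and the rule that segments are "connected" when they share an endpoint are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Euclidean distance from point p to the closest point on segment seg."""
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    denom = dx * dx + dy * dy
    t = 0.0
    if denom > 0:
        t = max(0.0, min(1.0, ((p[0] - ax) * dx + (p[1] - ay) * dy) / denom))
    return math.hypot(p[0] - (ax + t * dx), p[1] - (ay + t * dy))


def match_single_track(
    samples: Sequence[Point],
    segments: Dict[str, Segment],
    switch_cost: float = 1.0,    # assumed penalty for moving to a connected segment
    jump_cost: float = 100.0,    # assumed penalty for moving to an unconnected one
) -> List[str]:
    """Assign one segment name to each sample, minimizing distance + transitions."""
    if not samples:
        return []
    names = list(segments)

    def transition(u: str, v: str) -> float:
        if u == v:
            return 0.0
        connected = bool(set(segments[u]) & set(segments[v]))  # shared endpoint
        return switch_cost if connected else jump_cost

    cost = {s: point_segment_distance(samples[0], segments[s]) for s in names}
    back: List[Dict[str, str]] = []
    for p in samples[1:]:
        new_cost: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for v in names:
            u_best = min(names, key=lambda u: cost[u] + transition(u, v))
            new_cost[v] = (cost[u_best] + transition(u_best, v)
                           + point_segment_distance(p, segments[v]))
            pointers[v] = u_best
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest assignment backwards from the last sample.
    path = [min(names, key=lambda s: cost[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy map: road A along y=0, a parallel but unconnected road P along y=2,
# and road B going north from the end of A.
toy_segments = {
    "A": ((0.0, 0.0), (10.0, 0.0)),
    "P": ((0.0, 2.0), (10.0, 2.0)),
    "B": ((10.0, 0.0), (10.0, 10.0)),
}
toy_samples = [(1.0, 0.2), (4.0, 1.2), (7.0, 0.1), (10.3, 4.0)]
print(match_single_track(toy_samples, toy_segments))
```

The second sample is closer to P than to A, so snapping each point to its nearest road would hop onto the wrong street; the transition penalty keeps the match on A. With sparser or noisier samples this kind of per-track regularization has less to work with, which is the failure mode described above.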

In our work we address these issues using multi-track map matching, i.e., simultaneously matching a collection of trajectories to
a map. The advantage of this approach comes again from the observation that human trajectories show a "high degree of
temporal and spatial regularity". In the context of map matching, we have observed a large amount of repeated regular structure
in vehicle trajectories, even though the vehicles are driven by different drivers. Hence the aim of multi-track map matching is to recover the
regularity patterns (i.e., frequently used road segments) among the input trajectories, and to preserve the regularity in the
matched paths. From a data-driven perspective, multi-track map matching offers additional regularization constraints that
improve  the  map  matching  results  of  individual  trajectories  —  effectively  using  the  "wisdom  of  the  collection"  to  compensate  for 
noise  and  gaps  in  individual  trajectories.  Specifically,  for  a  set  of  partially  overlapping  trajectories,  we  enforce  that  the  projected 
paths  of  their  overlapping  regions  coincide.  Such  a  formulation  implicitly  increases  the  sampling  density  of  trajectories. 
Moreover,  the  overlapping  parts  of  these  trajectories  are  jointly  determined,  which  improves  the  robustness  of  the  map 
matching  procedure.  The  multi-track  idea  was  used  in  earlier  work,  with  the  assumption  that  all  trajectories  are  sampled  from 
the same underlying path. Our algorithm, designed to apply multi-track map matching to heterogeneous data, is more
practical for large-scale map matching applications, and it both accommodates and benefits from path diversity in the collection.

III. Locating Lucrative Passengers for Taxicab Drivers

In  a  big  city  like  Beijing,  there  are  more  than  10,000  taxis  operating  every  day,  and  the  majority  of  taxi  passengers  find  their 
taxis by standing beside the street and waiting for a vacant one to come by. Hence, each time taxicab drivers
drop off a passenger, they have to decide where to search for the next one. When the
objective of a driver is to maximize daily income, a natural strategy is to search within an "attractive" area where the
chance  of  finding  a  passenger  is  high.  However,  just  finding  any  passenger  is  not  sufficient:  taxi  drivers  prefer  long  trips,  since 
they  are  more  profitable.  On  the  other  hand,  long  trips  may  force  the  driver  into  a  remote  area  of  the  city  where  finding  the 
following  passenger  will  be  difficult.  As  every  taxi  driver  faces  this  problem  several  times  per  day,  he/she  develops  a  —  perhaps 
unconscious  —  strategy  based  on  personal  experience. 

In  this  work  we  posed  the  question  of  whether  a  good  strategy  for  finding  a  passenger  can  be  computed  from  GPS  trace  data; 
such  a  strategy  could  be  used,  for  instance,  as  a  basic  guideline  for  an  inexperienced  taxi  driver.  For  that  purpose,  we  model 
the  problem  of  finding  a  lucrative  passenger  as  a  Markov  Decision  Process  (MDP).  All  parameters  of  the  MDP  are  obtained  by 
analyzing a collection of GPS data of 1,000 taxis over one month in Beijing, China. We compute an optimal policy for the MDP
using  dynamic  programming;  that  policy  can  be  represented  as  a  directed  acyclic  graph  (DAG)  where  an  edge  from  location  A 
to  location  B  represents  a  recommendation  to  drive  from  A  to  B  when  looking  for  a  passenger.  In  particular,  a  sink  in  the  DAG  is 
a locally optimal area for finding a pickup, where the taxi should stay. We compute such policies for weekday daylight
hours, weekday night hours, and weekends, and find that the policies change in a meaningful way across these periods. We validate the
computed  policies  using  the  same  GPS  data:  we  identify  instances  where  the  search  path  of  the  taxi  drivers  agrees  with  our 
proposed  policy  and  show  that  such  instances  generate  more  income  than  the  average  trip. 
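
A toy version of such a computation is sketched below. It assumes a deliberately simplified model that is not taken from the report: at each step a vacant taxi either stays or drives to a neighboring location, pays a fixed per-step cost, and then finds a passenger at the new location with some probability, collecting an expected fare and ending the search. The location names, probabilities, fares, and the function `plan_search_policy` are all hypothetical; in the actual study the MDP parameters are estimated from the GPS data.

```python
from typing import Dict, List, Tuple


def plan_search_policy(
    neighbors: Dict[str, List[str]],
    pickup_prob: Dict[str, float],
    expected_fare: Dict[str, float],
    step_cost: float = 1.0,   # assumed cost of one time step (time and fuel)
    horizon: int = 50,        # number of dynamic-programming sweeps
) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Return (policy, value): where to go next from each location, and the
    expected income of following that policy for `horizon` steps."""
    value = {s: 0.0 for s in neighbors}
    policy: Dict[str, str] = {}
    for _ in range(horizon):
        new_value: Dict[str, float] = {}
        for s in neighbors:
            best_a, best_q = s, float("-inf")
            for a in [s] + list(neighbors[s]):  # staying is always allowed
                q = (-step_cost
                     + pickup_prob[a] * expected_fare[a]
                     + (1.0 - pickup_prob[a]) * value[a])
                if q > best_q:
                    best_a, best_q = a, q
            new_value[s] = best_q
            policy[s] = best_a
        value = new_value
    return policy, value


# Toy city with three locations on a line (all numbers are made up).
toy_neighbors = {"suburb": ["ring"], "ring": ["suburb", "center"], "center": ["ring"]}
toy_pickup = {"suburb": 0.1, "ring": 0.3, "center": 0.6}
toy_fare = {"suburb": 20.0, "ring": 20.0, "center": 20.0}

toy_policy, toy_value = plan_search_policy(toy_neighbors, toy_pickup, toy_fare)
sinks = [s for s, a in toy_policy.items() if a == s]
print(toy_policy, sinks)
```

Reading each entry of the policy as an edge from a location to the recommended next location gives the kind of graph described above: here the suburb points to the ring, the ring points to the center, and the center is a sink where the driver should wait.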

As  this  example  shows,  when  we  analyze  and  understand  trajectory  collections,  we  end  up  with  more  than  just  a  tool  for 
compressing  or  parametrizing  trajectories.  Trajectories,  besides  being  of  interest  in  themselves,  are  also  a  tool  for 
understanding  the  environment  in  which  the  motions  took  place  and  the  mobile  entities  that  traversed  or  generated  them.  A 
large  collection  of  taxicab  GPS  traces  from  Beijing  tells  us  a  lot  about  the  road  structure  of  the  city,  its  population  hubs  and  how 
people  move  between  them,  and  even  about  the  capabilities  of  the  driven  vehicles  and  proclivities  of  the  drivers  themselves. 

Technology  Transfer 


